CARS: A New Code Generation Framework for Clustered ILP Processors
Abstract
Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation, and instruction scheduling. Most of these approaches need additional re-scheduling phases because they often do not impose finite resource constraints in all phases of code generation. These phase-ordered solutions have several drawbacks that result in poorly performing code. Moreover, the iterative/back-tracking algorithms used in some of these schemes have long running times. In this paper, we present CARS, a code generation framework for clustered ILP processors that combines the cluster assignment, register allocation, and instruction scheduling phases into a single code generation phase, thereby eliminating the problems associated with phase-ordered solutions. The CARS algorithm explicitly takes into account all the resource constraints at each cluster scheduling step to reduce spilling and to avoid iterative re-scheduling steps. We also present a new on-the-fly register allocation scheme developed for CARS. We describe an implementation of the proposed code generation framework and the results of a performance evaluation study using the SPEC95/2000 and MediaBench benchmarks.
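To make the idea of a single combined phase concrete, the following is a minimal illustrative sketch, not the CARS algorithm itself, whose details the abstract does not give. It is a greedy one-pass scheduler that, for each operation, picks the cluster and issue cycle together while checking function-unit and register limits. The names `Op` and `schedule`, the parameters `fus_per_cluster`, `regs_per_cluster`, and `copy_latency`, and the crude register-pressure model (one register per value, freed once the last consumer has been placed) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    preds: list = field(default_factory=list)  # names of producer operations
    latency: int = 1


def schedule(ops, num_clusters=2, fus_per_cluster=1,
             regs_per_cluster=4, copy_latency=1):
    """Place each op (given in topological order) on a cluster and cycle in one pass."""
    by_name = {op.name: op for op in ops}
    # Number of consumers that still have to read each value.
    uses_left = {op.name: 0 for op in ops}
    for op in ops:
        for p in op.preds:
            uses_left[p] += 1

    placed = {}                # name -> (cluster, issue cycle)
    busy = {}                  # (cluster, cycle) -> function units in use
    live = [0] * num_clusters  # values currently held per cluster (hypothetical model)

    for op in ops:
        best = None
        for c in range(num_clusters):
            if live[c] >= regs_per_cluster:
                continue  # no free register on this cluster
            # Earliest cycle at which every operand has arrived on cluster c.
            ready = 0
            for p in op.preds:
                pc, pcycle = placed[p]
                arrival = pcycle + by_name[p].latency
                if pc != c:
                    arrival += copy_latency  # assumed inter-cluster copy cost
                ready = max(ready, arrival)
            # Delay further until a function unit is free on cluster c.
            cycle = ready
            while busy.get((c, cycle), 0) >= fus_per_cluster:
                cycle += 1
            if best is None or cycle < best[1]:
                best = (c, cycle)
        if best is None:
            raise RuntimeError(f"no free register for {op.name}; a spill would be needed")
        c, cycle = best
        placed[op.name] = best
        busy[(c, cycle)] = busy.get((c, cycle), 0) + 1
        live[c] += 1  # the result of this op occupies one register
        for p in op.preds:
            uses_left[p] -= 1
            if uses_left[p] == 0:
                live[placed[p][0]] -= 1  # last consumer placed, register freed
    return placed


if __name__ == "__main__":
    graph = [Op("a"), Op("b"), Op("c", ["a", "b"]), Op("d", ["c"])]
    for name, (cluster, cycle) in schedule(graph).items():
        print(f"{name}: cluster {cluster}, cycle {cycle}")
```

Because resource limits are checked at the moment each operation is placed, this sketch never needs a later re-scheduling pass, which is the property the abstract attributes to a combined phase. A realistic implementation would also model copy operations as instructions that consume function units and registers, which this sketch omits.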
Similar resources
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster an...
A Partitioned Register File Architecture and Compilation Scheme 3 COMN
In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and c...
Impact of ILP-improving Code Transformations on Loop Buffer Energy
For multimedia applications, loop buffering is an efficient mechanism to reduce the power in the instruction memory of embedded processors. In particular, software-controlled clustered loop buffers are very energy efficient. However, code transformations needed in VLIW compilers to reach a higher ILP may have a large negative influence on the energy consumed in the instruction memori...
ILP-based Approximations for Retargetable Code Optimization
Embedded systems are characterized by high performance requirements and are subject to severe cost and power consumption restrictions. This has led to the development of specialized irregular hardware architectures for which traditional code generation and optimization techniques usually fail to generate machine code of satisfactory quality. The PROPAN system has been developed as a framework f...
Optimal Integrated VLIW Code Generation with Integer Linear Programming
We give an Integer Linear Programming (ILP) solution that fully integrates all steps of code generation, i.e. instruction selection, register allocation and instruction scheduling, on the basic block level for VLIW processors. In earlier work, we contributed a dynamic programming (DP) based method for optimal integrated code generation, implemented in our retargetable code generator OPTIMIST. I...
Publication date: 2001